
Reviews: Regularizing Trajectory Optimization with Denoising Autoencoders

Neural Information Processing Systems

The paper addresses the problem of trajectory optimization algorithms exploiting inaccuracies of learned dynamics models in model-based reinforcement learning. To this end, it proposes adding a regularizer to the optimization cost: an estimate of the log probability (over a local window) of sampling the optimized trajectory from the distribution of known trajectories. The idea is to prevent trajectories from deviating too far from the data used to learn the dynamics model, and hence to avoid unreliable solutions. The authors estimate the log-probability term with a denoising autoencoder network. They provide multiple experiments comparing their method to other state-of-the-art approaches on standard environments and datasets.
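
One way to write the regularized objective the review describes, in our own notation (a sketch, not necessarily the paper's exact formulation): the planning cost is penalized by the log-density of local trajectory windows, and the classical result of Alain and Bengio (2014) lets a denoising autoencoder g, trained with Gaussian corruption of variance \sigma_n^2, supply the gradient of that density:

    % Regularized objective (our notation): c is the task cost, x_t a local
    % window of consecutive states and actions, alpha a penalty weight.
    \min_{a_{1:T}} \; \sum_{t=1}^{T} c(s_t, a_t) \;-\; \alpha \sum_{t} \log p(x_t),
    \qquad x_t = (s_{t-w}, \dots, s_t, a_{t-w}, \dots, a_t)

    % For a DAE g trained to optimality with Gaussian corruption
    % (Alain & Bengio, 2014):
    g(x) - x \;\approx\; \sigma_n^2 \, \nabla_x \log p(x)

so the gradient of the penalty term is available from a single forward pass through the denoiser.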


Reviews: Regularizing Trajectory Optimization with Denoising Autoencoders

Neural Information Processing Systems

Reviewers find the addition of DAE-style regularization to the trajectory-optimization phase of model-based RL interesting, and appreciate the writing and execution of the paper. Reviewers did, however, express concerns regarding the novelty of the work (a straightforward application of an existing method) and would like to see more experiments demonstrating the effectiveness of the proposed method under different dynamics models. Connections to behavior cloning and to off-policy learning in the model-free setting would also be of interest to discuss. Overall, reviewers lean toward accepting the paper, so we have decided to accept it as is. Please address the reviewers' comments in your final draft.


Regularizing Trajectory Optimization with Denoising Autoencoders

Boney, Rinu, Di Palo, Norman, Berglund, Mathias, Ilin, Alexander, Kannala, Juho, Rasmus, Antti, Valpola, Harri

Neural Information Processing Systems

Trajectory optimization using a learned model of the environment is one of the core elements of model-based reinforcement learning. This procedure often suffers from exploiting inaccuracies of the learned model. We propose to regularize trajectory optimization by means of a denoising autoencoder that is trained on the same trajectories as the model of the environment. We show that the proposed regularization leads to improved planning with both gradient-based and gradient-free optimizers. We also demonstrate that using regularized trajectory optimization leads to rapid initial learning in a set of popular motor control tasks, which suggests that the proposed approach can be a useful tool for improving sample efficiency.
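
To make the gradient-free case concrete, below is a minimal sketch of a cross-entropy-method planner with a DAE reconstruction-error penalty, which acts (up to scaling) as a proxy for -log p(x) near the data manifold. The callables dynamics, cost, and dae, the window of a single step, and the weight alpha are illustrative assumptions, not the authors' implementation.

# Sketch: CEM-style planning with a DAE penalty. `dynamics(s, a)`, `cost(s, a)`
# and `dae(x)` are assumed callables (the learned model, the task cost, and a
# denoising autoencoder trained on the same trajectories); `alpha` is ours.
import numpy as np

def plan_cem(s0, dynamics, cost, dae, horizon=10, act_dim=2,
             pop=100, elites=10, iters=5, alpha=1.0, seed=0):
    rng = np.random.default_rng(seed)
    mu = np.zeros((horizon, act_dim))
    sigma = np.ones((horizon, act_dim))
    for _ in range(iters):
        cand = rng.normal(mu, sigma, size=(pop, horizon, act_dim))
        scores = np.empty(pop)
        for i in range(pop):
            s, total = s0, 0.0
            for t in range(horizon):
                a = cand[i, t]
                x = np.concatenate([s, a])  # window of one step, for brevity
                # Reconstruction error penalizes out-of-distribution points,
                # steering plans back toward data the model was trained on.
                total += cost(s, a) + alpha * np.sum((dae(x) - x) ** 2)
                s = dynamics(s, a)
            scores[i] = total
        elite = cand[np.argsort(scores)[:elites]]
        mu, sigma = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mu[0]  # execute the first action, MPC-style

The penalty grows for trajectory points the DAE cannot reconstruct, i.e. points far from the training data, which is what keeps the planner in regions where the dynamics model is trustworthy.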


Regularizing Trajectory Optimization with Denoising Autoencoders

Boney, Rinu, Di Palo, Norman, Berglund, Mathias, Ilin, Alexander, Kannala, Juho, Rasmus, Antti, Valpola, Harri

arXiv.org Machine Learning

Trajectory optimization with learned dynamics models can often suffer from erroneous predictions of out-of-distribution trajectories. We propose to regularize trajectory optimization by means of a denoising autoencoder that is trained on the same trajectories as the dynamics model. We visually demonstrate the effectiveness of the regularization in gradient-based trajectory optimization for open-loop control of an industrial process. We compare with recent model-based reinforcement learning algorithms on a set of popular motor control tasks to demonstrate that the denoising regularization enables state-of-the-art sample efficiency. We demonstrate the efficacy of the proposed method in regularizing both gradient-based and gradient-free trajectory optimization.
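
For the gradient-based case mentioned in this abstract, the same penalty can be optimized directly by backpropagating through differentiable dynamics, cost, and DAE modules. The PyTorch sketch below is again an illustrative assumption (module names, horizon, and alpha are ours), not the paper's code:

# Sketch: gradient-based trajectory optimization with a DAE penalty, assuming
# differentiable `dynamics`, `cost`, and `dae` (e.g. torch.nn.Module instances).
import torch

def plan_gd(s0, dynamics, cost, dae, horizon=10, act_dim=2,
            steps=50, lr=0.05, alpha=1.0):
    actions = torch.zeros(horizon, act_dim, requires_grad=True)
    opt = torch.optim.Adam([actions], lr=lr)
    for _ in range(steps):
        s, total = s0, torch.zeros(())
        for t in range(horizon):
            x = torch.cat([s, actions[t]])
            # Smooth proxy for -log p(x): DAE reconstruction error.
            total = total + cost(s, actions[t]) + alpha * ((dae(x) - x) ** 2).sum()
            s = dynamics(s, actions[t])
        opt.zero_grad()
        total.backward()
        opt.step()
    return actions.detach()  # open-loop action sequence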